Multilevel Optimization of Pipelined Caches

نویسندگان

  • Kunle Olukotun
  • Trevor N. Mudge
  • Richard B. Brown
چکیده

This paper formulates and shows how to solve the problem of selecting the cache size and depth of cache pipelining that maximizes the performance of a given instruction-set architecture. The solution combines trace-driven architectural simulations and the timing analysis of the physical implementation of the cache. Increasing cache size tends to improve performance but this improvement is limited because cache access time increases with its size. This trade-off results in an optimization problem we referred to as multilevel optimization, because it requires the simultaneous consideration of two levels of machine abstraction: the architectural level and the physical implementation level. The introduction of pipelining permits the use of larger caches without increasing their apparent access time, however the bubbles caused by load and branch delays limit this technique. In this paper we also show how multilevel optimization can be applied to pipelined systems if softwareand hardware-based strategies are considered for hiding the branch and load delays. The multilevel optimization technique is illustrated with the design of a pipelined cache for a high clock rate MIPS-based architecture. The results of this design exercise show that, because processors with pipelined caches can have shorter CPU cycle times and larger caches, a significant performance advantage is gained by using two or three pipeline stages to fetch data from the cache. Of course, the results are only optimal for the implementation technologies chosen for the design exercise; other choices could result in quite different optimal designs. The exercise is primarily to illustrate the steps in the design of pipelined caches using multilevel optimization, however, it does exemplify the importance of pipelined caches if high clock rate processors are to achieve high performance.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Citation : O . Olukotun , T

This paper formulates and shows how to solve the problem of selecting the cache size and depth of cache pipelining that maximizes the performance of a given instruction-set architecture. The solution combines trace-driven architectural simulations and the timing analysis of the physical implementation of the cache. Increasing cache size tends to improve performance but this improvement is limit...

متن کامل

190-MHz CMOS 4-Kbyte Pipelined Caches

In this paper we describe the design and implementation of a 190-MHz pipelined 4-Kbyte instruction and data cache. The caches are designed in 1.0-μm CMOS and measure 0.78 x 0.47 cm2. This paper describes the microarchitecture, cache timing, circuit implementation, and layout of both the instruction and the data cache. The key features of these caches are pipelined execution and the use of dynam...

متن کامل

Direct-mapped versus set-associative pipelined caches

As the tag check may be executed in a speciic pipeline stage, cache pipelining allows to reach the same processor cycle time with a set-associative cache or a direct-mapped cache. On a direct-mapped cache, the data or the instruction owing out from the cache may be used in parallel with the tag check. When using a pipelined cache, such an optimistic execution results in load and branch delays o...

متن کامل

High Performance Cache Architectures to Support Dynamic Superscalar Microprocessors

Simple cache structures are not sufficient to provide the memory bandwidth needed by a dynamic superscalar computer, so more sophisticated memory hierarchies such as non-blocking and pipelined caches are required. To provide direction for the designers of modern high performance microprocessors, we investigate the performance tradeoffs of the combinations of cache size, blocking and non-blockin...

متن کامل

Using Time Skewing to Eliminate Idle Time due to Memory Bandwidth and Network Limitations

Time skewing is a compile-time optimization that can provide arbitrarily high cache hit rates for a class of iterative calculations, given a sufficient number of time steps and sufficient cache memory. Thus, it can eliminate processor idle time caused by inadequate main memory bandwidth. In this article, we give a generalization of time skewing for multiprocessor architectures, and discuss time...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • IEEE Trans. Computers

دوره 46  شماره 

صفحات  -

تاریخ انتشار 1997